Goto

Collaborating Authors

 penalty factor






LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization

Zhang, Erica, Goto, Ryunosuke, Sagan, Naomi, Mutter, Jurik, Phillips, Nick, Alizadeh, Ash, Lee, Kangwook, Blanchet, Jose, Pilanci, Mert, Tibshirani, Robert

arXiv.org Machine Learning

We introduce LLM-Lasso, a novel framework that leverages large language models (LLMs) to guide feature selection in Lasso $\ell_1$ regression. Unlike traditional methods that rely solely on numerical data, LLM-Lasso incorporates domain-specific knowledge extracted from natural language, enhanced through a retrieval-augmented generation (RAG) pipeline, to seamlessly integrate data-driven modeling with contextual insights. Specifically, the LLM generates penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model. Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model, while less relevant features are assigned higher penalties, reducing their influence. Importantly, LLM-Lasso has an internal validation step that determines how much to trust the contextual knowledge in our prediction pipeline. Hence it addresses key challenges in robustness, making it suitable for mitigating potential inaccuracies or hallucinations from the LLM. In various biomedical case studies, LLM-Lasso outperforms standard Lasso and existing feature selection baselines, all while ensuring the LLM operates without prior access to the datasets. To our knowledge, this is the first approach to effectively integrate conventional feature selection techniques directly with LLM-based domain-specific reasoning.


Predict+Optimize for Packing and Covering LPs with Unknown Parameters in Constraints

Hu, Xinyi, Lee, Jasper C. H., Lee, Jimmy H. M.

arXiv.org Artificial Intelligence

Predict+Optimize is a recently proposed framework which combines machine learning and constrained optimization, tackling optimization problems that contain parameters that are unknown at solving time. The goal is to predict the unknown parameters and use the estimates to solve for an estimated optimal solution to the optimization problem. However, all prior works have focused on the case where unknown parameters appear only in the optimization objective and not the constraints, for the simple reason that if the constraints were not known exactly, the estimated optimal solution might not even be feasible under the true parameters. The contributions of this paper are two-fold. First, we propose a novel and practically relevant framework for the Predict+Optimize setting, but with unknown parameters in both the objective and the constraints. We introduce the notion of a correction function, and an additional penalty term in the loss function, modelling practical scenarios where an estimated optimal solution can be modified into a feasible solution after the true parameters are revealed, but at an additional cost. Second, we propose a corresponding algorithmic approach for our framework, which handles all packing and covering linear programs. Our approach is inspired by the prior work of Mandi and Guns, though with crucial modifications and re-derivations for our very different setting. Experimentation demonstrates the superior empirical performance of our method over classical approaches.


How do Kernel Regularizers work with neural networks?

#artificialintelligence

Regularization is the process of fine-tuning neural network models by inducing a penalty term in the error parameter to obtain an optimal and reliable model which converges better with minimal loss during testing and performs better for unseen data. Regularization helps us get a more generic and reliable model which functions well with respect to changes in patterns of data and any possible uncertainties. So in this article let us see how kernel regularizers work with neural networks and place at what layers of the neural networks are useful to obtain optimal neural networks. Regularization is the process of adding penalty factors to the network layers to alter the weight propagation through the layers which facilitate the model to converge optimally. There are mainly two types of penalties that can be enforced on the network layers which are named as L1 regularization considers the weight of the layers as it is while the L2 regularization considers the squares of weights.


Neural Pruning via Growing Regularization

Wang, Huan, Qin, Can, Zhang, Yulun, Fu, Yun

arXiv.org Artificial Intelligence

Regularization has long been utilized to learn sparsity in deep neural network pruning. However, its role is mainly explored in the small penalty strength regime. In this work, we extend its application to a new scenario where the regularization grows large gradually to tackle two central problems of pruning: pruning schedule and weight importance scoring. (1) The former topic is newly brought up in this work, which we find critical to the pruning performance while receives little research attention. Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains compared with its one-shot counterpart, even when the same weights are removed. (2) The growing penalty scheme also brings us an approach to exploit the Hessian information for more accurate pruning without knowing their specific values, thus not bothered by the common Hessian approximation problems. Empirically, the proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning. Their effectiveness is demonstrated with modern deep neural networks on the CIFAR and ImageNet datasets, achieving competitive results compared to many state-of-the-art algorithms. Our code and trained models are publicly available at https://github.com/mingsuntse/regularization-pruning.


Penalized Langevin dynamics with vanishing penalty for smooth and log-concave targets

Karagulyan, Avetik, Dalalyan, Arnak S.

arXiv.org Machine Learning

We study the problem of sampling from a probability distribution on $\mathbb R^p$ defined via a convex and smooth potential function. We consider a continuous-time diffusion-type process, termed Penalized Langevin dynamics (PLD), the drift of which is the negative gradient of the potential plus a linear penalty that vanishes when time goes to infinity. An upper bound on the Wasserstein-2 distance between the distribution of the PLD at time $t$ and the target is established. This upper bound highlights the influence of the speed of decay of the penalty on the accuracy of the approximation. As a consequence, considering the low-temperature limit we infer a new nonasymptotic guarantee of convergence of the penalized gradient flow for the optimization problem.


Feature-weighted elastic net: using "features of features" for better prediction

Tay, J. Kenneth, Aghaeepour, Nima, Hastie, Trevor, Tibshirani, Robert

arXiv.org Machine Learning

In some supervised learning settings, the practitioner might have additional information on the features used for prediction. We propose a new method which leverages this additional information for better prediction. The method, which we call the feature-weighted elastic net ("fwelnet"), uses these "features of features" to adapt the relative penalties on the feature coefficients in the elastic net penalty. In our simulations, fwelnet outperforms the lasso in terms of test mean squared error and usually gives an improvement in true positive rate or false positive rate for feature selection. We also apply this method to early prediction of preeclampsia, where fwelnet outperforms the lasso in terms of 10-fold cross-validated area under the curve (0.86 vs. 0.80). We also provide a connection between fwelnet and the group lasso and suggest how fwelnet might be used for multi-task learning.